Category Enhanced Word Embedding
نویسندگان
چکیده
Distributed word representations have been demonstrated to be effective in capturing semantic and syntactic regularities. Unsupervised representation learning from large unlabeled corpora can learn similar representations for those words that present similar cooccurrence statistics. Besides local occurrence statistics, global topical information is also important knowledge that may help discriminate a word from another. In this paper, we incorporate category information of documents in the learning of word representations and to learn the proposed models in a documentwise manner. Our models outperform several state-of-the-art models in word analogy and word similarity tasks. Moreover, we evaluate the learned word vectors on sentiment analysis and text classification tasks, which shows the superiority of our learned word vectors. We also learn high-quality category embeddings that reflect topical meanings.
منابع مشابه
Using Embedding Masks for Word Categorization
Word embeddings are widely used nowadays for many NLP tasks. They reduce the dimensionality of the vocabulary space, but most importantly they should capture (part of) the meaning of words. The new vector space used by the embeddings allows computation of semantic distances between words, while some word embeddings also permit simple vector operations (e.g. summation, difference) resembling ana...
متن کاملContradiction Detection with Contradiction-Specific Word Embedding
Contradiction detection is a task to recognize contradiction relations between a pair of sentences. Despite the effectiveness of traditional context-based word embedding learning algorithms in many natural language processing tasks, such algorithms are not powerful enough for contradiction detection. Contrasting words such as “overfull” and “empty” are mostly mapped into close vectors in such e...
متن کاملSemantic analysis in word vector spaces with ICA and feature selection
In this article, we test a word vector space model using direct evaluation methods. We show that independent component analysis is able to automatically produce meaningful components that correspond to semantic category labels. We also study the amount of features needed to represent a category using feature selection with syntactic and semantic category test sets.
متن کاملConnected Component Based Word Spotting on Persian Handwritten image documents
Word spotting is to make searchable unindexed image documents by locating word/words in a doc-ument image, given a query word. This problem is challenging, mainly due to the large numberof word classes with very small inter-class and substantial intra-class distances. In this paper, asegmentation-based word spotting method is presented for multi-writer Persian handwritten doc-...
متن کاملRepresentation Learning for Aspect Category Detection in Online Reviews
User-generated reviews are valuable resources for decision making. Identifying the aspect categories discussed in a given review sentence (e.g., “food” and “service” in restaurant reviews) is an important task of sentiment analysis and opinion mining. Given a predefined aspect category set, most previous researches leverage handcrafted features and a classification algorithm to accomplish the t...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1511.08629 شماره
صفحات -
تاریخ انتشار 2015